ABSTRACT

In analyzing exploratory repeated measures data with 
more than two measures, two competing tests must be administered 
simultaneously if one is to make an efficient and effective decision 
regarding the tenability of the null hypothesis of no differences 
among measurement means. Obviously, such a procedure is not without a 
cost vis-a-vis Type I error control. This study represents a measure 
of that cost. The Type I error properties for the simultaneous 
application of the mixed model test and the multivariate test were 
estimated. The results of this Monte Carlo robustness study suggest
that a single rule of thumb designed to control Type I error (i.e., 
split the alpha or, alternatively, do not split the alpha) is not 
practical under all circumstances. A more dynamic method for 
satisfactory management of Type I errors, which deals with departures 
from sphericity and the related effects on correlation of the two 
tests, is outlined. A 28-item list of references and five data tables
are included. (Author/TJH) 
Abstract

In analyzing exploratory repeated measures data with more than 
two measures, two competing tests must be administered 
simultaneously if one is to make an efficient and effective decision 
regarding the tenability of the null hypothesis of no differences 
among the measurement means. Obviously, such a procedure is not 
without a cost vis-a-vis Type I error control. This study 
represents a measure of that cost. The simulation results reported 
here suggest that a single rule of thumb designed to control Type I 
error (i.e., split the α or, alternatively, don't split the α) is
not practical under all circumstances. A more dynamic method for 
the satisfactory management of Type I error is reported. 
Type I Error for the Simultaneous Application
of Two Tests for Repeated Measures Data 
Introduction 

Behavioral scientists often use one or another variation of the 
repeated measures research design to make decisions concerning 
behavioral and psychological data (Edgington, 1974; Jennings and 
Wood, 1976; Lana and Lubin, 1963). In an exploratory investigation 
where specific a priori contrasts cannot be reasonably formulated, a 
researcher must depend upon an omnibus F statistic for a decision 
regarding the presence or absence of a treatment effect. An 
analysis alternative in this situation is the mixed model analysis 
of variance (Scheffé, 1959). This particular analysis assumes,
among other things, a mathematical property known as sphericity
(Huynh and Feldt, 1970). 

Huynh and Feldt (1970) and Rouanet and Lepine (1970) showed
that sphericity is necessary and sufficient for the ratio of mixed
model variances to be distributed as F. Huynh and Feldt (1970)
referred to this condition as sphericity while Rouanet and Lepine
(1970) used the term circularity. Departures from sphericity are
indexed by the value of ε, which is well known (Box, 1954; Geisser
and Greenhouse, 1958; Imhof, 1962).
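
The index ε can be computed directly from a variance-covariance matrix. The sketch below uses the standard Box (1954) form, ε = [tr(CΣC')]² / [(k-1) tr((CΣC')²)], where C is any (k-1) x k orthonormal contrast matrix. The function names `orthonormal_contrasts` and `box_epsilon` are illustrative and are not taken from the original study.

```python
import numpy as np

def orthonormal_contrasts(k):
    """Return a (k-1) x k matrix of orthonormal Helmert-type contrasts."""
    C = np.zeros((k - 1, k))
    for i in range(k - 1):
        C[i, : i + 1] = 1.0          # first i+1 measures
        C[i, i + 1] = -(i + 1.0)     # compared with the next measure
        C[i] /= np.linalg.norm(C[i])  # scale each row to unit length
    return C

def box_epsilon(sigma):
    """Box's epsilon for a k x k variance-covariance matrix sigma."""
    sigma = np.asarray(sigma, dtype=float)
    k = sigma.shape[0]
    C = orthonormal_contrasts(k)
    S = C @ sigma @ C.T              # covariance matrix of the contrasts
    return np.trace(S) ** 2 / ((k - 1) * np.trace(S @ S))

# Sphericity holds for an identity matrix, so epsilon is 1.
eps_spherical = box_epsilon(np.eye(4))

# A rank-one contrast covariance gives the lower bound 1/(k-1).
v = np.array([1.0, 2.0, 3.0, 4.0])
eps_lower_bound = box_epsilon(np.outer(v, v))
```

The two example matrices bracket the range of ε: 1 under sphericity and 1/(k-1) under the maximal departure.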

Unfortunately, the mixed model test is not robust with respect
to even small departures from sphericity (Huynh, 1978). Departures
from sphericity cause the test to be positively biased. Moreover,
behavioral and psychophysiological data almost certainly depart from
sphericity (Keselman and Rogan, 1980). Several correction factors
have been proposed to remedy this problem. In general, the
correction factors are fractions which, when applied to the mixed
model degrees of freedom, cause the test to approximate its
theoretical null distribution. The correction factor used in this
paper (ε̂) was proposed by Greenhouse and Geisser (1959).

A second analysis alternative for this type of data does not 
assume sphericity. This test evaluates the same null hypothesis, 
but is conducted as a multivariate analysis of variance (Bock, 
1975). The two algorithms, the adjusted mixed model and the 
multivariate model, differ sufficiently to cause them not to be 
interchangeable. In what follows, these differences are reviewed. 
The Statistical Power Differential 

The general form of the multivariate null hypothesis for a
design with g groups, k repeated measures, and one dependent
variable per occasion, is:

$$H_0: ABC' = D$$

where A is a (g-1) x g contrast matrix representing the between group
hypothesis, B is a g x k matrix of cell means, and C is a (k-1) x k
contrast matrix representing the within factor hypothesis (Timm,
1975). In the single group repeated measures design there is no
between group hypothesis; therefore, the matrix A is a scalar set at
unity. The matrix D is typically a null matrix.

The multivariate sums of squares and cross products matrices
for the hypothesis (H) and error (E) are given by:

$$H = C B' (X'X) B C' \tag{1}$$

and

$$E = C (Y - \hat{Y})' (Y - \hat{Y}) C' \tag{2}$$

where B is a 1 x k vector of the treatment means over the k
occasions, and X is a design matrix. Here, X is an n x 1 vector of
ones, where n is the number of subjects. The matrix C is the
(k-1) x k contrast matrix, and Y is an n x k matrix containing the n
vectors of observations. The matrix Ŷ is defined as Ŷ = XB.

The omnibus multivariate repeated measures hypothesis can be
tested using

$$\Lambda = \frac{|E|}{|E + H|} \tag{3}$$

where Λ is Wilks's likelihood ratio criterion (Wilks, 1932) with
k-1, 1, and n-k+1 degrees of freedom, and |E| denotes the
determinant of the matrix E. A multivariate F statistic can be
obtained by

$$F = \frac{1 - \Lambda}{\Lambda} \cdot \frac{\nu_2}{\nu_1} \tag{4}$$

where ν₁ are the k-1 hypothesis degrees of freedom, and ν₂ are the
n-k+1 error degrees of freedom.

The F statistic for the multivariate approach to repeated
measures given in Equation 4 is invariant to the linear contrasts in
the matrix C. That is, any combination of k-1 linearly independent
contrasts in the matrix C will yield the same test statistic. If
the contrasts in C are row-wise orthonormal, the omnibus mixed model
test statistic can be obtained from Equations 1 and 2 as well. The
omnibus mixed model test statistic obtained through the multivariate
model is given by

$$F = \frac{\mathrm{tr}(H) / q_1}{\mathrm{tr}(E) / q_2} \tag{5}$$

where tr is the trace of a matrix, q₁ are the k-1 hypothesis degrees
of freedom, and q₂ are the (k-1)(n-1) error degrees of freedom.
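
The computations in Equations 1 through 5 can be traced numerically. The following is a minimal sketch in Python with NumPy, not the authors' FORTRAN program. It assumes the hypothesis matrix takes the form H = CB'(X'X)BC' and that the multivariate F multiplies (1 - Λ)/Λ by the ratio of error to hypothesis degrees of freedom. The function names are hypothetical.

```python
import numpy as np

def orthonormal_contrasts(k):
    """Return a (k-1) x k matrix of orthonormal Helmert-type contrasts."""
    C = np.zeros((k - 1, k))
    for i in range(k - 1):
        C[i, : i + 1] = 1.0
        C[i, i + 1] = -(i + 1.0)
        C[i] /= np.linalg.norm(C[i])
    return C

def repeated_measures_tests(Y, C):
    """Return (multivariate F, mixed model F) for an n x k data matrix Y.

    The mixed model F is only meaningful when the rows of C are orthonormal.
    """
    n, k = Y.shape
    X = np.ones((n, 1))                          # design matrix: column of ones
    B = np.linalg.solve(X.T @ X, X.T @ Y)        # 1 x k vector of occasion means
    resid = Y - X @ B                            # Y minus its fitted values
    H = C @ B.T @ (X.T @ X) @ B @ C.T            # Equation 1
    E = C @ resid.T @ resid @ C.T                # Equation 2
    wilks = np.linalg.det(E) / np.linalg.det(E + H)   # Equation 3
    v1, v2 = k - 1, n - k + 1
    f_multivariate = (1.0 - wilks) / wilks * (v2 / v1)   # Equation 4
    q1, q2 = k - 1, (k - 1) * (n - 1)
    f_mixed = (np.trace(H) / q1) / (np.trace(E) / q2)    # Equation 5
    return f_multivariate, f_mixed

# Illustrative data only: 12 subjects measured on 4 occasions.
rng = np.random.default_rng(0)
Y = rng.normal(size=(12, 4))
f_mv, f_mm = repeated_measures_tests(Y, orthonormal_contrasts(4))
```

Passing any other set of k-1 linearly independent contrasts leaves the multivariate F unchanged, which is the invariance property noted above.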

When k = 2 the mixed model and multivariate model tests are
identical. However, when k > 2, the two tests can differ
substantially. When k > 2 and ε = 1, the mixed model test is always
more powerful considering its greater number of error degrees of
freedom. When k > 2 and ε ≠ 1, the power of the two tests is
determined by the pattern of mean differences relative to the
structure of the variance-covariance matrix.

Consider the following explanation. Recall that B is a row
vector of the k means. The elements in the vector given by BC' are
the sum of the differences among the repeated measures found by each
contrast in C. The k-1 elements of this vector are the contrast
effects. When they are squared, the contrast effects are referred
to with the following notation: ψᵢ² (i = 1, 2, ... k-1).

Let L be a matrix with component column vectors which are the
eigenvectors of CΣC', where Σ is the k x k variance-covariance
matrix. When the orthonormal contrast matrix C is premultiplied
by L', CΣC' will be a diagonal matrix with the eigenvalues (λ)
along the diagonal. In other words, CΣC' is reduced to its
canonical form with uncorrelated contrast variances along the
diagonal. These variances are referred to with the notation σᵢ²,
where σᵢ² = CΣC'(i,i), (i = 1, 2, ... k-1). Note that each
contrast effect, ψᵢ, is associated with its contrast variance, σᵢ².
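
The reduction to canonical form can be illustrated as follows. This is a sketch under stated assumptions: the 3 x 3 matrix `sigma` is an arbitrary positive definite example, and the function name `canonical_contrasts` is hypothetical.

```python
import numpy as np

def canonical_contrasts(sigma, C):
    """Rotate orthonormal contrasts C so that C* sigma C*' is diagonal.

    Returns the rotated contrast matrix and the eigenvalues, which are the
    uncorrelated contrast variances.
    """
    S = C @ sigma @ C.T
    eigenvalues, L = np.linalg.eigh(S)   # columns of L are eigenvectors of S
    C_star = L.T @ C                     # premultiply C by L'
    return C_star, eigenvalues

# Orthonormal contrasts for k = 3 occasions.
C = np.array([
    [1.0, -1.0, 0.0],
    [1.0, 1.0, -2.0],
])
C /= np.linalg.norm(C, axis=1, keepdims=True)

# Illustrative variance-covariance matrix (an assumption, not study data).
sigma = np.array([
    [4.0, 2.0, 1.0],
    [2.0, 3.0, 1.0],
    [1.0, 1.0, 2.0],
])

C_star, contrast_variances = canonical_contrasts(sigma, C)
canonical_form = C_star @ sigma @ C_star.T   # diagonal up to rounding error
```

Because L is orthogonal, the rotated rows remain orthonormal contrasts, so the mixed model statistic computed from them is unchanged.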

Building upon Imhof (1962) and upon Davidson (1972), Barcikowski
and Robey (1984a) explained the statistical power difference between
the mixed model test and the multivariate test by examining their
respective noncentrality parameters. A brief review follows.

With the above restrictions on C, the noncentrality parameter
for the mixed model (δ²) is given by the sum of the k-1 mixed
model contrast noncentrality parameters. The noncentrality
parameter for the ith mixed model contrast (δᵢ²) is given by:

$$\delta_i^2 = \frac{n(k-1)\,\psi_i^2}{\sum_{j=1}^{k-1} \sigma_j^2}$$

Similarly, the multivariate noncentrality parameter (ζ²) is given
by the sum of the k-1 multivariate contrast noncentrality
parameters. The noncentrality parameter for the ith multivariate
model contrast (ζᵢ²) is given by:

$$\zeta_i^2 = \frac{n\,\psi_i^2}{\sigma_i^2}$$


When one (or some) combination(s) of the k means (i.e., ψᵢ)
represents most of the treatment effect and when that same
combination accounts for a relatively large portion of the treatment
by subjects interaction effect (i.e., σᵢ²), the mixed model test
dominates in terms of power. On the other hand, when that same
contrast accounts for a relatively small portion of the treatment by
subjects interaction effect, the multivariate test dominates in
terms of power. That is, the pooling of errors in the denominator
of δ² will wash out an isolated treatment effect when substantial
experimental error exists that is not associated with that treatment
effect.
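
A small numerical illustration may help here. The sketch assumes the usual forms of the two noncentrality parameters: the mixed model value pools the contrast variances, n(k-1)Σψᵢ² / Σσᵢ², while the multivariate value sets each effect against its own variance, nΣ(ψᵢ²/σᵢ²). The function name and the example values are hypothetical, and a larger noncentrality parameter is only one ingredient of power, since the two tests also differ in error degrees of freedom.

```python
import numpy as np

def noncentralities(psi, S, n):
    """Return (mixed model, multivariate) noncentrality parameters.

    psi : the k-1 contrast effects for orthonormal contrasts
    S   : (k-1) x (k-1) covariance matrix of those contrasts
    n   : number of subjects
    """
    psi = np.asarray(psi, dtype=float)
    S = np.asarray(S, dtype=float)
    q = psi.size                                          # k - 1
    mixed = n * q * (psi @ psi) / np.trace(S)             # pooled error
    multivariate = n * (psi @ np.linalg.solve(S, psi))    # separate errors
    return mixed, multivariate

# Canonical form with one quiet contrast and one noisy contrast (k = 3).
S = np.diag([1.0, 9.0])

# Effect isolated in the low-variance contrast.
effect_in_quiet_contrast = noncentralities([1.0, 0.0], S, n=10)

# Effect isolated in the high-variance contrast.
effect_in_noisy_contrast = noncentralities([0.0, 1.0], S, n=10)
```

In the first case the multivariate parameter is five times the mixed model parameter; in the second the mixed model parameter is the larger of the two, which mirrors the verbal account above.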

Davidson (1972) found that the difference in power between the
two tests was most noticeable when small treatment effects were
coupled with small n's (i.e., n exceeds k by no more than a few).
Barcikowski and Robey (1984a) noted that small n's and small
treatment effects were likely occurrences in exploratory
investigations.

Exploratory investigations are those experiments conducted when
a researcher has little prior information, experience, or both with
which to design a study. In a repeated measures research design,
this means that a researcher has no way of identifying a priori
which of the two repeated measures tests will be more powerful when
applied to the observations. That is, in an exploratory situation,
it is impossible to make valid estimates of the above elements in
order to complete the necessary calculations for determining the
test of choice. It is this research design dilemma that motivated
the present study.

Purpose 

Barcikowski and Robey (1984a, 1984b) and Robey and Barcikowski
(1984, 1987) addressed this dilemma by advocating simultaneous
application of the adjusted mixed model test and the multivariate
model test to evaluate the tenability of the omnibus null hypothesis
in exploratory experiments. While this advice is sound with respect
to Type II error control, it is problematic with respect to the
maintenance of Type I error control. As a further step, Barcikowski
and Robey (1984a, 1984b and elsewhere) have suggested splitting the
Type I error tolerance (α) equally between the two tests. This is
probably a conservative procedure given that the Type I error rate
for two simultaneously conducted independent tests is 1 - (1 - α)², or
approximately 2α. Others have informally suggested that since the
two tests are not independent, one might as well use the same α for
both tests.
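
The arithmetic behind the independent-tests bound is brief. A minimal sketch follows; the function name `familywise_error` is illustrative.

```python
def familywise_error(alpha, n_tests=2):
    """Probability of at least one Type I error across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Each test run at the full alpha: close to 2 * alpha.
unsplit_05 = familywise_error(0.05)      # 1 - 0.95**2
unsplit_01 = familywise_error(0.01)      # 1 - 0.99**2

# Alpha split equally between the two tests: just under alpha.
split_05 = familywise_error(0.05 / 2)    # 1 - 0.975**2
```

For independent tests the unsplit rate at α = .05 is .0975, and the split rate is .049375. Because the two repeated measures tests are correlated, the rate for the simultaneous procedure should fall between α and this upper figure.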

The purpose of the present investigation was to estimate the
Type I error properties for the simultaneous application of the
mixed model test and the multivariate test. The mixed model test
was here adjusted for ε. The research design selected for
examination was the single group repeated measures design. A Monte
Carlo technique was employed because the mathematical derivation of
the distribution for this procedure is precluded by the complexity
resulting from two separate error terms and two separate error
degrees of freedom.

Methods

The independent variables in this Monte Carlo robustness study
were: the magnitude of the departure from sphericity; the number of
measurement occasions (k); and the number of observation units (n).
The magnitude of the departure from sphericity was varied at the
following levels: no departure (i.e., ε = 1), slight departure
(i.e., ε = .9), moderate departure (i.e., ε = .75), severe departure
(i.e., ε = .5), and a maximal departure (i.e., ε = 1/(k-1)). The
number of occasions in a single group repeated measures design was
varied at 3, 5, 7 and 10. The number of observations in the design
was varied at (k-1) + 3, (k-1) + 10, (k-1) + 20, and (k-1) + 30.
Thus, the research design under investigation was represented by
reasonable ranges of departure from sphericity, of design
size, and of sample size.

The dependent variable in this experiment was the proportion
(π) of incorrect rejections of the null hypothesis as indicated by
at least one of the two algorithms.

A FORTRAN subroutine, DRNMVN, from the International
Mathematical and Statistical Libraries, Inc. (IMSL, 1987) was used
to generate multivariate normal data for each variance-covariance
matrix. The analysis program was a double precision G level
VS FORTRAN program.
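
The general shape of such a simulation can be sketched in Python. This is a stand-in built on NumPy and SciPy, not the IMSL-based FORTRAN program described above. The replication count, the seed, the Greenhouse-Geisser estimate computed from the error matrix, and all function names are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def orthonormal_contrasts(k):
    """Return a (k-1) x k matrix of orthonormal Helmert-type contrasts."""
    C = np.zeros((k - 1, k))
    for i in range(k - 1):
        C[i, : i + 1] = 1.0
        C[i, i + 1] = -(i + 1.0)
        C[i] /= np.linalg.norm(C[i])
    return C

def simultaneous_type1_rate(sigma, n, alpha, reps=2000, seed=0):
    """Estimate the proportion of null data sets rejected by either test."""
    rng = np.random.default_rng(seed)
    k = sigma.shape[0]
    C = orthonormal_contrasts(k)
    v1, v2 = k - 1, n - k + 1                  # multivariate degrees of freedom
    q1, q2 = k - 1, (k - 1) * (n - 1)          # mixed model degrees of freedom
    rejections = 0
    for _ in range(reps):
        # Null hypothesis is true: all k population means are zero.
        Y = rng.multivariate_normal(np.zeros(k), sigma, size=n)
        Z = Y @ C.T                            # n x (k-1) contrast scores
        z_bar = Z.mean(axis=0)
        H = n * np.outer(z_bar, z_bar)         # hypothesis SSCP matrix
        E = (Z - z_bar).T @ (Z - z_bar)        # error SSCP matrix
        wilks = np.linalg.det(E) / np.linalg.det(E + H)
        f_mv = (1.0 - wilks) / wilks * (v2 / v1)
        f_mm = (np.trace(H) / q1) / (np.trace(E) / q2)
        # Greenhouse-Geisser estimate of epsilon from the sample.
        eps_hat = np.trace(E) ** 2 / (q1 * np.trace(E @ E))
        p_mv = stats.f.sf(f_mv, v1, v2)
        p_mm = stats.f.sf(f_mm, eps_hat * q1, eps_hat * q2)
        if p_mv < alpha or p_mm < alpha:       # either test rejects
            rejections += 1
    return rejections / reps

# Sphericity holds (identity matrix), k = 3, n = (k-1) + 10.
rate = simultaneous_type1_rate(np.eye(3), n=12, alpha=0.05)
```

Each test is run here at the full nominal α, so the returned proportion estimates π for the unsplit procedure. Examining other departures from sphericity would require constructing a Σ with the desired ε, which this sketch leaves out.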
Statistical Hypotheses

The general forms of the null and alternate hypotheses were
H₀: π = α and H₁: π ≠ α. Here, π represents the population
proportion of tests which exceed a critical value, and α represents
nominal Type I error. Nominal Type I error was examined at .01 and
.05. Bradley's (1978) robustness criterion of α ± 0.5α was adopted
as a definition for adequate Type I error performance. Thus,
departures of ± .005 from α = .01, and departures of ± .025 from
α = .05 were defined as meaningful. A two-tailed test for
proportions described by Cohen (1977, p. 213) was used to analyze
the obtained results. The a priori α for all applications of the
proportions test was set at .01. The desired minimal statistical
power for all applications of the proportions test was set at .80.
Following the method for establishing sample size described by Cohen
(1977), it was determined that 5711 observations were needed to
evaluate H₀: π = .01, and that 1085 observations were needed to
evaluate H₀: π = .05.
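
Bradley's criterion reduces to a simple interval check. A minimal sketch follows, with illustrative function names. It covers only the "meaningful departure" half of the decision, not the accompanying proportions test.

```python
def bradley_limits(alpha):
    """Lower and upper robustness limits, alpha minus and plus half of alpha."""
    return alpha - 0.5 * alpha, alpha + 0.5 * alpha

def is_meaningful_departure(observed_rate, alpha):
    """True when an observed Type I error rate falls outside the limits."""
    lower, upper = bradley_limits(alpha)
    return observed_rate < lower or observed_rate > upper

limits_01 = bradley_limits(0.01)   # approximately (.005, .015)
limits_05 = bradley_limits(0.05)   # approximately (.025, .075)

flag_a = is_meaningful_departure(0.016, 0.01)   # outside the limits
flag_b = is_meaningful_departure(0.071, 0.05)   # inside the limits
```

An observed rate is marked in the tables only when it is both outside these limits and significantly different from α by the proportions test.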

Results 

The results of this experiment can be found in Tables 1
through 5. In Table 1 it can be seen that when the sphericity
assumption is met, the simultaneous test procedure yields
acceptable, albeit slightly elevated, Type I error rates. In
general, the same pattern can be seen in Table 2 which contains the
observed Type I error rates for the slight departure from sphericity
(i.e., ε = .9) condition.

Table 3 contains the π values observed under the moderate
departure from sphericity (i.e., ε = .75) condition. This table
contains seven unacceptably high values at the α = .01 level and one
unacceptably high value at the α = .05 level. While these eight
values exceed their respective α values, none of them represents a
drastic departure. The situation is more serious under a severe
departure from sphericity (i.e., ε = .5). More than half of the
observed Type I error rates in Table 4 exceed α. The mean of the
unacceptable π's is approximately 1.6α at both levels.

Table 5 contains the results for the condition of maximal
departure from sphericity (i.e., ε = 1/(k-1)). Every observed
Type I error rate reported in Table 5 exceeds α. In fact, as k
increases, π approaches 2α. Note that the values reported in Table 5
for the k = 3 condition are the same values reported in Table 4. That
is, ε = .5 in both cases.

Discussion 

The pattern characterizing these results across the various
departures from sphericity is clear. As the value of ε decreases
the values of π increase. This pattern is probably best explained
by the correlation between the two tests. When the sphericity
assumption is met or nearly met the two tests are maximally
correlated. It can be seen that when ε = 1, the two noncentrality
parameters are equal. As the magnitude of the departure from
sphericity increases, the two tests become less well correlated. As
can be seen in Table 5, π approaches 1 - (1 - α)² when ε = 1/(k-1),
particularly for the greater levels of k.

The question to be clarified by these results is, "What
magnitude of Type I error tolerance should be split equally between
the two repeated measures tests?" To date, answers to this question
have ranged from α (i.e., give each test one half of α) to 2α
(i.e., give each test the full measure of α).

The results reported here suggest that the latter tack (giving
each test the full measure of α) is reasonable when the sphericity
assumption is, or is nearly, met. However, when the data are
characterized by a very severe departure from the sphericity
assumption, the former approach (giving each test one half of α) is
indicated. Neither tack is particularly appropriate under a moderate
departure from sphericity. If one assumes on the basis of Huynh and
Feldt (1976) that behavioral data usually do not fall below ε = .75,
then the liberal decision appears the more attractive in terms of
general merit.

The dilemma, however, remains. The fact that the magnitude of
the departure from sphericity cannot be estimated a priori in an
exploratory study precludes prudent application of either tack. For
this reason we recommend the following.

1. Treat the value of ε̂ as a descriptive statistic.

2. Set α for the simultaneous test procedure somewhere
between 1α and 2α on the basis of the magnitude of the
departure from sphericity.

3. Divide that error tolerance equally
between the two tests.

Accordingly, the Type I error tolerance for the simultaneous test
procedure would approach 2α as the value of ε̂ increases. Under the
worst of all circumstances vis-a-vis sphericity, the Type I error
tolerance for the simultaneous test procedure would be set at, or
near, 1α.

An aspect of this strategy which deserves comment is the fact
that a sample estimate becomes the linchpin for a decision regarding
an inferential test. While this practice is often cause for
concern, here we note the results of several simulation studies that
have compared the ε̂-adjusted mixed model test to the ε-adjusted
mixed model test in terms of Type I and Type II error performance
(Collier, Baker, Mandeville and Hayes, 1967; Mendoza, Toothaker, and
Nicewander, 1974; Stollof, 1970; and Wilson, 1975). In each of
these studies, the estimate performed very much like the parameter.
As a result, practitioners now routinely adjust the mixed model
degrees of freedom with ε̂ rather than ε. It would seem then that ε̂
represents a relatively stable and unbiased estimate.

Subsequent simulations have indicated that setting the value in
step #2 at 1.7 has the desired effect in terms of Type I error
control when ε = .75. Moreover, values of 1.25 and 1 have worked
well for ε = .5 and ε = 1/(k-1), respectively. Researchers who
prefer a more stringent definition of Type I error tolerance than
that used here may want to set the value in step #2 at 1.8 or so,
even when the sphericity assumption is nearly met.
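
The three recommended steps, together with the multipliers just reported, could be put into practice along the following lines. This is only a sketch: the linear interpolation between the reported points, the use of a multiplier of 2 when ε̂ = 1, and the function name `per_test_alpha` are assumptions that do not come from the study.

```python
import numpy as np

def per_test_alpha(eps_hat, k, alpha):
    """Alpha to use for each of the two tests, given the sample epsilon.

    The multiplier for the total tolerance runs from 1 (maximal departure)
    to 2 (sphericity), passing through 1.25 at .5 and 1.7 at .75.
    """
    if k < 3:
        raise ValueError("the two tests are identical when k = 2")
    lower = 1.0 / (k - 1)                        # smallest possible epsilon
    eps_hat = min(max(eps_hat, lower), 1.0)      # keep within the valid range
    anchors = [(lower, 1.0), (0.5, 1.25), (0.75, 1.7), (1.0, 2.0)]
    # Drop reported points that do not lie above the lower bound for this k.
    points = [anchors[0]] + [(e, m) for e, m in anchors[1:] if e > lower]
    eps_values, multipliers = zip(*points)
    multiplier = float(np.interp(eps_hat, eps_values, multipliers))
    return multiplier * alpha / 2.0              # step 3: split equally

# Moderate departure with five occasions at a nominal alpha of .05.
alpha_each = per_test_alpha(0.75, k=5, alpha=0.05)   # 1.7 * .05 / 2
```

With ε̂ = .75 each test would be evaluated at .0425; at the lower bound each test receives α/2, and under sphericity each receives the full α.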
Conclusion

Barcikowski and Robey (1984a, 1984b) and Robey and 
Barcikowski (1984, 1987) have related a strategy for analyzing 
exploratory repeated measures data which effectively manages Type II 
error. Their strategy affords the application scientist confidence 
with respect to the detection of false omnibus null hypotheses. The 
results reported here provide the application scientist with similar 
confidence vis-a-vis the management of Type I error when analyzing 
repeated measures data. 
TABLE 1

Observed Type I Error Rates for the Simultaneous Application
of Both Tests When Sphericity Exists

n           k = 3     k = 5     k = 7     k = 10

k-1+3       .013      .010      .011      .010
            .073      .064      .068      .065

k-1+10      .013      .010      .015      .014
            .064      .063      .067      .071

k-1+20      .013      .014      .012      .016*
            .062      .063      .064      .070

k-1+30      .010      .016*     .012      .014
            .060      .066      .064      .064

Note. The double entries for each Monte Carlo problem
represent α at .01 (top), and at .05 (bottom). An asterisk
indicates a significant (p < .01) and a meaningful departure
from α.
TABLE 2

Observed Type I Error Rates for the Simultaneous Application
of Both Tests When ε = .9

n           k = 3     k = 5     k = 7     k = 10

k-1+3       .014      .014      .012      .012
            .073      .070      .068      .064

k-1+10      .015      .014      .013      .015
            .068      .073      .071      .072

k-1+20      .016*     .015      .016*     .015
            .062      .069      .070      .064

k-1+30      .017*     .014      .013      .012
            .067      .067      .063      .065

Note. The double entries for each Monte Carlo problem
represent α at .01 (top), and at .05 (bottom). An asterisk
indicates a significant (p < .01) and a meaningful departure
from α.
TABLE 3

Observed Type I Error Rates for the Simultaneous Application
of Both Tests When ε = .75

n           k = 3     k = 5     k = 7     k = 10

k-1+3       .016*     .015      .012      .011
            .071      .077*     .064      .068

k-1+10      .016*     .016*     .016*     .014
            .068      .075      .067      .072

k-1+20      .018*     .018*     .013      .015
            .072      .072      .072      .071

k-1+30      .015      .017*     .015      .014
            .071      .070      .069      .068

Note. The double entries for each Monte Carlo problem
represent α at .01 (top), and at .05 (bottom). An asterisk
indicates a significant (p < .01) and a meaningful departure
from α.
TABLE 4

Observed Type I Error Rates for the Simultaneous Application
of Both Tests When ε = .5

n           k = 3     k = 5     k = 7     k = 10

k-1+3       .017*     .013      .015      .013
            .079*     .071      .080*     .077*

k-1+10      .016*     .014      .012      .015
            .086*     .074      .074      .079*

k-1+20      .017*     .017*     .016*     .016*
            .075      .077*     .074      .079*

k-1+30      .018*     .016*     .016*     .014
            .081*     .074      .080*     .075

Note. The double entries for each Monte Carlo problem
represent α at .01 (top), and at .05 (bottom). An asterisk
indicates a significant (p < .01) and a meaningful departure
from α.
TABLE 5

Observed Type I Error Rates for the Simultaneous Application
of Both Tests Under a Maximal Departure from Sphericity

n           k = 3     k = 5     k = 7     k = 10

k-1+3       .017*     .018*     .019*     .020*
            .079*     .088*     .090*     .098*

k-1+10      .016*     .017*     .019*     .019*
            .086*     .095*     .088*     .094*

k-1+20      .017*     .019*     .020*     .020*
            .075*     .086*     .087*     .089*

k-1+30      .018*     .018*     .019*     .020*
            .081*     .085*     .091*     .090*

Note. The double entries for each Monte Carlo problem
represent α at .01 (top), and at .05 (bottom). An asterisk
indicates a significant (p < .01) and a meaningful departure
from α.